Comparison of Methods to Assess Similarity between Phrases
Abstract
We study the problem of measuring similarity between phrases by comparing three similarity methods. The first considers the commonalities and differences of the two phrases. The second is a word-oriented extension of the well-known Levenshtein-Damerau distance. The third considers the sequentiality of the phrases and is robust to phrases with repeated words. Finally, we present an experimental evaluation of our methods on both English and Spanish corpora.
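To make the second method more concrete, here is a minimal Python sketch of a word-level edit distance with adjacent transpositions (the "optimal string alignment" variant of Damerau-Levenshtein). It is an illustration only, not the authors' implementation: the function name `word_damerau_levenshtein`, the whitespace tokenization, the lowercasing, and the unit cost for every operation are all assumptions made for this example.

```python
def word_damerau_levenshtein(phrase_a: str, phrase_b: str) -> int:
    """Edit distance between two phrases, counted in whole words.

    Assumed operations (each with cost 1): insert a word, delete a word,
    substitute a word, or swap two adjacent words.
    """
    # Assumption: words are whitespace-separated and compared case-insensitively.
    a = phrase_a.lower().split()
    b = phrase_b.lower().split()

    # dist[i][j] = cost of turning the first i words of a into the first j words of b
    dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dist[i][0] = i  # delete all i words
    for j in range(len(b) + 1):
        dist[0][j] = j  # insert all j words

    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition: the last two words appear swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)

    return dist[len(a)][len(b)]


if __name__ == "__main__":
    print(word_damerau_levenshtein("the quick brown fox", "the brown quick fox"))
```

Because the comparison operates on words rather than characters, swapping two neighbouring words counts as a single edit, whereas a plain word-level Levenshtein distance would charge two substitutions for the same change.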
Similar resources
Combining Web-Based Searching with Latent Semantic Analysis to Discover Similarity Between Phrases
Determining semantic similarity between words, concepts and phrases is important in many areas within Artificial Intelligence. This includes the general areas of information retrieval, data mining, and natural language processing. Existing approaches have primarily focused on noun to noun synonym comparison. We propose a new technique for the comparison of general expressions that combines web ...
An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...
SYDE 676 Project Report – Fall 2002 Web Document Clustering Using Phrase-based Document Similarity
Measuring the similarity between documents is an essential operation in text mining, especially document clustering. The traditional method of finding the similarity between documents has always been based on extracting individual words from the documents, and using heuristics to give weights to those features. Standard methods in data mining are then used to find the similarity between documen...
Similar Term Discovery using Web Search
We present an approach to the discovery of semantically similar terms that utilizes a web search engine as both a source for generating related terms and a tool for estimating the semantic similarity of terms. The system works by associating with each document in the search engine’s index a weighted term vector comprising those phrases that best describe the document’s subject matter. Related t...
Algorithm for Semantic Based Similarity Measure
A document representation model, the Semantic Based Similarity Measure (SBSM), is proposed. This model combines phrase analysis as well as word analysis, using PropBank notation as background knowledge, to explore better ways of representing documents for clustering. The SBSM assigns semantic weights to both document words and phrases. The new weights reflect the semantic relatedne...